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ABSTRACT 

We describe a compression method for floating-point astronomical images that gives compres- 
sion ratios of 6 - 10 while still preserving the scientifically important information in the image. 
The pixel values are first preprocessed by quantizing them into scaled integer intensity levels, 
which removes some of the uncompressible noise in the image. The integers are then losslessly 
compressed using the fast and efficient Rice algorithm and stored in a portable FITS format 
file. Quantizing an image more coarsely gives greater image compression, but it also increases 
the noise and degrades the precision of the photometric and astrometric measurements in the 
quantized image. Dithering the pixel values during the quantization process greatly improves 
the precision of measurements in the more coarsely quantized images. We perform a series of 
experiments on both synthetic and real astronomical CCD images to quantitatively demonstrate 
that the magnitudes and positions of stars in the quantized images can be measured with the 
predicted amount of precision. In order to encourage wider use of these image compression meth- 
ods, we have made available a pair of general-purpose image compression programs, called fpack 
and funpack, which can be used to compress any FITS format image. 

Subject headings: image compression, FITS format 


1. Introduction 

The exponential growth in the volume of as- 
tronomical data being generated from ever larger 
format imaging detectors continues to drive up 
the data handling costs of both large and small 
telescope projects. The cost of archiving the im- 
ages is only one part of the problem. A typical 
observational work flow involves a long cascade 
of temporary and permanent copies of the data 
replicated from observatory to data processing fa- 
cility to deep storage site to science archive cen- 


ter to virtual observatory portal to multiple end 
users. Each science image provided as input to 
a pipeline will produce several output images as 
a result of processing operations such as resam- 
pling onto a standard grid, co-adding, mosaicking, 
and any analysis steps specific to the science pro- 
gram. The problem is one of throughput - not just 
storage - of the total system data flow. Network 
transmission costs also rival or exceed the cost of 
storage media and can be breathtakingly large for 
spacecraft or remote mountaintops. Often there is 
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an upper limit to the network bandwidth at any 
price. Making effective use of the available image 
compression technologies is an important compo- 
nent in dealing with these data handling costs. 

Lossy data compression techniques, which do 
not exactly preserve each image pixel value but 
do still preserve the required scientific information 
content of the image, are already used by many 
projects. To cite just a few examples, the Kepler 
space telescope project (Caldwell et al. 2010) uses 
a lossy image quantization method to reduce the 
data volume to fit within the limited bandwidth 
from its heliocentric orbit, the GONG helioseis- 
mology project (Harvey et al. 1996; Goodrich et 
al. 2004) relies on lossy compression to transmit 
images from its telescopes deployed at six locations 
worldwide, and the NOAO High-Performance 
Pipeline (Valdes k Swaters 2007) includes a final 
quantization step from an internal floating-point 
representation to the final integer archive data 
products to achieve greater image compression. 

In a previous contribution (Pence, Seaman, k 
White 2009, hereafter, Paper I), we demonstrated 
that the maximum possible lossless compression 
ratio for integer format astronomical images is de- 
termined by the amount of noise in the background 
pixels (i.e., the “sky”) in the image. The noise in 
astronomical images often has 2 main components: 
Poissonian-distributed photon noise and Gaussian 
distributed “read-out” noise. If each pixel is rep- 
resented by BITPIX bits (usually 16 or 32 bits) 
and if, on average, Mbits of those bits are filled 
with uncompressible, randomly fluctuating noise, 
then the maximum theoretical compression ratio 
for that image is given by BITPIX / In 

practice, no compression algorithm is 100% effi- 
cient, so the actual maximum compression ratio, 
R. is given by 

R = BITPIX/ (Mbits + K) (1) 

where K is an empirical measure of the efficiency 
(or overhead) of the algorithm in units of bits 
per pixel. We compared several lossless compres- 
sion algorithms and found that the Rice algorithm 
(Rice, Yeh. k Miller 1993; White k Becker 1998), 
which has a small I< value of about 1.2 bits per 
pixel, provided the best combination of speed and 
compression efficiency. 

The relationship between noise and entropy in 
images is discussed more fully in the appendix of 


Paper I, based on the seminal work by Shannon 
(1948), where we showed that the “equivalent” 
number of noise bits per pixel in an image can 
be calculated from the noise (<r) of the pixels in 
background regions of the image such that 

Mbits = log 2 (ffVl2) = log 2 (<7) + 1.792 (2) 

which, when combined with equation 1 , gives 

R — BITPIX/(log 2 (<7) + 1.792 + K) (3) 

In this current article we extend the previous 
analysis of integer images to study compression 
techniques for astronomical images in floating- 
point format. In the cases we are mainly con- 
cerned with here, these images originally had in- 
teger pixel values, but were converted into 32-bit 
IEEE floating-point format during the calibration 
processing (e.g., bias subtraction, flat-fielding, ab- 
solute flux calibration, etc.). Equation 3 applies 
to floating point images as well as integer im- 
ages, however typical floating-point astronomical 
images contain so much noise that it is impossible 
to achieve significant amounts of compression with 
lossless algorithms (often less than a factor of 2). 
One explanation for this excess noise is that a 32- 
bit IEEE floating-point number can express 7 deci- 
mal places of precision but this usually far exceeds 
the inherent precision of individual image pixel 
values. As a result many of the least significant 
bits in the mantissa of the floating-point pixel val- 
ues are effectively filled with quasi-random noise 
which is inherently uncompressible. While there 
are counter-examples of floating-point FITS arrays 
that have very little noise and can be losslessly 
compressed effectively (e.g., generated by theoret- 
ical simulations), these are not typical of the types 
of floating-point images commonly found in large 
astronomical data archives and are not the subject 
of this article. 

There are many published articles on loss- 
less floating-point data compression schemes (see 
Lindstrom k Isenburg 2006, and reference therein 
as recent examples) but it is beyond the scope here 
to summarize them in detail. The main point is 
that ultimately all of these lossless compression 
techniques face the same Shannon entropy limit, 
and the only way to achieve greater compression 
of noisy floating-point images is to use techniques 
that discard some of the noise. These methods 
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are technically “lossy” because they do not ex- 
actly preserve the pixel values, however, if only 
noise is discarded, then the compression can still 
be considered lossless from a scientific standpoint 
because all the useful information is retained. 

In the remainder of this article we describe a 
lossy compression technique for floating-point as- 
tronomical images that provides an optimal com- 
bination of speed, compression ratio, and preser- 
vation of information content. It is faster and 
achieves much higher compression than generic 
lossless file compression algorithms like GZIP 
(Gailly & Adler 1992), yet it produces no sig- 
nificant loss of information in the image when 
used appropriately. In §2 we describe in detail 
the quantization and compression techniques that 
have been implemented in our publicly available 
fpack and funpack image compression utility pro- 
grams. Then in §3 we describe the results of 
several experiments that demonstrate that astro- 
nomical floating-point images can be compressed 
by up to a factor of 10 without significant loss 
of astrometric or photometric precision. Finally, 
§4 summarizes the results and gives recommenda- 
tions for achieving the best compression of astro- 
nomical images. 

2. Quantization and Compression Meth- 
ods 

The most common lossy compression technique 
for floating-point images is to preprocess the pixel 
values by quantizing them into a smaller set of dis- 
crete values prior to applying a lossless compres- 
sion algorithm. In the simplest case, the values 
are rounded into a grid of equally spaced floating- 
point levels. This reduces the number of different 
bit patterns in the image pixels (i.e., reduces the 
entropy in the image) and improves the efficiency 
of file compression programs like GZIP which ac- 
cumulate a dictionary of the most common bit pat- 
terns in the file and represent them using a shorter 
code in the compressed file. Watson (2002) ap- 
plied an analogous technique to integer images. 
In order to accommodate compression algorithms, 
like Rice, that only operate on integer arrays, the 
quantized floating-point values are usually repre- 
sented by scaled integers so that the image pixel 


values are approximated by 

FloatValue = ScaleFactorx IntegerValue+ZeroPoint 

( 4 ) 

Note that this integer scaling technique was the 
only way to represent floating-point images in the 
FITS data format (Hanisch et al. 2001) before 
support for the IEEE floating-point format was 
officially added in 1990. 

As an aside, there are many articles in the liter- 
ature (e.g., Nieto-Santisteban et al. 1999; Gowen 
& Smith 2003; Nicula, Berghmans, & Hochedez 
2005; Seaman, White, & Pence 2009; Bernstein et 
al. 2010) that advocate using a square root scal- 
ing function, in part because the Poissonian shot 
noise scales by this same factor. What is usually 
not stated in these studies is that practically the 
same increase in compression that is obtained af- 
ter applying a square root scaling function can be 
obtained by linearly scaling all the pixels in the 
image by the same factor as is applied to the back- 
ground pixels during the square root scaling. Since 
the compression ratio is determined mainly by the 
noise in the background areas of the image (from 
equation 3), the amount of scaling that is applied 
to the relatively infrequent bright pixels usually 
makes little difference to the overall compression 
ratio of the image. In experiments on simulated 
astronomical images, Bernstein et al. (2010) found 
that using square-root scaling only produced sig- 
nificantly better compression than linear scaling 
when more than 10% of the image pixels are af- 
fected by isolated bright objects or cosmic rays. 

2.1. FITS Tiled Image Compression Con- 
vention 

White & Greenfield (1999) developed the tech- 
nique that forms the basis of the FITS tiled-image 
compression format that is used here with a few 
new refinements. Each row of the image (or in 
principle, any other rectangular ‘tile 5 in the image) 
is compressed separately to provide fast random 
access to individual sections of an image without 
having to uncompress the entire image. In the 
case of floating-point images, the pixel values are 
converted to integers with a scale factor that, by 
default, is proportional to the measured amount 
of noise in that tile. 

The noise in each tile of the image is calcu- 
lated using a robust algorithm that was originally 
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developed to measure the signal-to-noise in spec- 
troscopic data (Stoehr et al. 2007). In particular, 
we use their third order “Median Absolute Dif- 
ference” (MAD) formula to compute the standard 
deviation of the pixel values: 

a = 0.6052 x median(— Xt_ 2 + 2 x, — Xi+ 2 ) (5) 

where i is the vector index of the pixel within each 
row of the Image, and z* is the value of the i th 
pixel. The median value is computed over all the 
pixels in each row of the image. In the limiting 
case where the pixel values have a Gaussian dis- 
tribution, this formula converges to same value as 
the Standard Deviation of the pixels. Note that 
this formula is not affected by linear intensity gra- 
dients across the image (which are canceled out by 
the first and third terms), and the use of the me- 
dian makes the result insensitive to the presence 
of outlying large pixel values. For example, if one 
randomly sets 5% of the pixels in an image with 
Gaussian distributed noise to very large values to 
simulate the effects of cosmic rays, then the MAD 
noise estimate only increases by about 20%. 

Once the noise level has been calculated, the 
floating-point pixels are quantized into scaled in- 
tegers where the quantized levels are spaced at 
some user-specified fraction, q, of the noise, so 
that the spacing is given by cr/q. Normalizing the 
quantization spacing to cr is a convenient way to 
produce similar quality compressed images regard- 
less of the intrinsic noise level in the image. The 
scaled integers are then compressed using the de- 
fault Rice algorithm, or one of the other optional 
compression algorithms. Finally, the compressed 
stream of bytes is stored in a FITS binary table 
structure as defined in the FITS tiled image com- 
pression convention (Pence et al. 2000; Seaman et 
al. 2007). 

Since the noise a in the array of quantized in- 
tegers is simply equal to q, the expected image 
compression ratio, from equation 3. is given by 

R = BITPIX/ (log 2 (q) + 1.792 + K) (6) 

For example, quantizing an image using q values of 
1 or 4 will produce compression ratios of about 10 
or 6.4. respectively, when using the Rice algorithm 
that has K ~ 1.2. In most cases, the compression 
ratio does not depend very much on the distri- 
bution of objects or structures within the image 


itself. Note that this formula tends to break down 
for q values much less than 1 because many com- 
pression algorithms become less effective on very 
low entropy images and because the size of the 
FITS file header, which remains uncompressed, 
becomes relatively more significant. 

One important caveat with the use of this quan- 
tizing method is that if the noise level in the image 
is significantly overestimated for some reason (for 
our purposes it is generally sufficient if it is accu- 
rate to within about a factor of 2), then the im- 
age may be inadvertently quantized more coarsely 
than expected for a given q value, thus possibly 
causing a loss of information in the compressed 
image. Circumstances where the MAD algorithm 
could overestimate the noise include the following: 

• If a large fraction of the pixels in an im- 
age tile are covered by bright objects and 
have values (and noise) which is many times 
greater than in the fainter pixels, then the 
MAD noise estimate will be representative 
of those bright pixels and not that of the 
fainter “background” pixels. This may cause 
the fainter pixels to be more coarsely quan- 
tized than would be desired. 

• If a significant amount of the pixel-to-pixel 
variation in the image is real and not just 
due to noise, (e.g., if there are large pixel- 
to-pixel variations in the detector sensitiv- 
ity, or if the observed object contains signifi- 
cant structure on a pixel size scale), then the 
noise level may be overestimated. 

• If the noise in an image is anti-correlated 
between adjacent pixels (e.g., after certain 
types of image convolutions) then the noise 
will be overestimated. Note, however, that 
equation 5 is a function only of the values in 
every other pixel in each row of the image, so 
the anti-correlation scale length must extend 
over at least 2 pixels to affect the MAD noise 
estimate. 

Our fpack compression program (see §2.4) offers 
several options for dealing with this issue. One 
simple method is to just compress the image us- 
ing a larger q value to compensate for the pos- 
sible MAD noise overestimate, although this can 
negatively affect the overall compression ratio of 
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the image. Another option in cases where only 
a small fraction of the rows in an image might 
be affected is to compress the entire image as a 
single tile, rather than using the default row-by- 
row tiling pattern, so that the MAD noise estimate 
then more accurately reflects the noise in the back- 
ground regions of the image as a whole. Finally, 
fpack users do not need to rely on the MAD noise 
estimate at all, and instead can directly specify 
the desired spacing between the quantization lev- 
els. This latter option is especially appropriate 
for projects that generate large amounts of rela- 
tively homogeneous images because it ensures that 
all the images will be compressed using the same 
quantization factor. This method has the added 
advantage that it will improve the compression 
speed because it is not necessary to compute the 
MAD noise value in this case. 

2.2. Benefits of Dithering 

The effects of linear quantization are naturally 
greater in the fainter areas of an image than for the 
brighter pixels where the Poissonian uncertainty 
of the photon counts can be much larger than the 
quantized spacing. Measurement of the local ‘sky’ 
background around the image of a star or galaxy 
can be especially vulnerable to the effects of quan- 
tization, in part because it is usually necessary to 
identify and exclude pixels that are affected by 
other objects or by defects in the detector. If an 
image is coarsely quantized then most of the back- 
ground pixels will have a value equal to one of the 2 
quantization levels that bracket the true sky level. 
Any algorithm for rejecting outlying contaminated 
pixels will tend to reject more of the pixels that 
have the value that is further from the true sky 
level, and thus the sky level that is derived from 
the remaining pixels will tend to be biased towards 
the value of the quantized level that lies closest to 
the true sky level. 

The issue of detecting small amplitude signals 
in a quantized system is a well-studied problem 
in engineering and communications fields where 
the phenomenon is known as “stochastic reso- 
nance”. The somewhat counter-intuitive solution 
for improving the signal-to-noise of measurements 
of quantized data is to add a moderate amount 
of noise into the system. When applied to im- 
ages, this technique is commonly called “dither- 
ing”. Widrow & Kollar (2008) devote 2 chapters of 


their book to the theory and practice of dithering 
and recommend using a clever “subtractive dither- 
ing” technique, first proposed by Roberts (1962). 
This technique overcomes the drawback of having 
to add noise to the image: a dither is added to the 
quantizer input, and the same dither is subtracted 
again from the quantizer output. The dither thus 
behaves as a catalyst which makes the process 
work better but does not appear in the output im- 
age. We have adopted this technique when quan- 
tizing floating-point images by adding a random 
dither, Ri, with a value uniformly distributed be- 
tween 0 and 1 during the scaling process, 

Ii = round(Fi/ScaleFactor -f Ri — 0.5) (7) 

where Fi is the original floating-point value and I t 
is the the quantized integer value. The interest- 
ing trick that distinguishes subtractive dithering 
from ordinary dithering methods is that exactly 
the same random dither value is subtracted when 
converting back to the quantized float value: 

F, ~ (Ii + 0.5 - Ri) x ScaleFactor (8) 

The net effect of this subtractive dithering oper- 
ation is to shift the entire grid of linearly spaced 
intensity levels up or down by a random amount 
on a pixel by pixel basis. It should be noted that 
the dithered values are uniformly distributed be- 
tween the quantized levels in this implementation. 
One possible future enhancement may be to use a 
triangular or Gaussian dither (Widrow & Kollar 
2008) which may more closely replicate the actual 
distribution of pixel values in the image. 

In order to use this subtractive dithering 
method it is necessary to define a specific pseudo 
random number generator (PRNG) algorithm for 
use by both the compressor and the uncompres- 
sor so that the same predictable sequence of ran- 
dom numbers is used in both cases. We adopted 
the PRNG algorithm described by Park & Miller 
(1988) which has been shown to produce statis- 
tically independent random numbers uniformly 
distributed between 0 and 1. However, for prag- 
matic reasons we do not compute a unique random 
number for every single image pixel because (a) 
it would add significant computational overhead 
relative to the very efficient Rice compression al- 
gorithm, and (b) true randomness is not required 
for this purpose since even crude dithering pat- 
terns would still help to mitigate the measurement 
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biases in quantized images. Our compromise so- 
lution is to calculate a look-up-table of 10000 ran- 
dom numbers using the above mentioned PRNG, 
and then repeatedly recycle through this LUT 
when calculating the amount of dither for each 
pixel in the image. As a precaution against in- 
troducing regular cyclical noise patterns into the 
image, a pseudo random starting offset is com- 
puted each time when recycling through the LUT. 
Finally, 1 out of a possible 10000 initial seed val- 
ues for the entire dithering process is computed 
based on the system clock time (or optionally the 
checksum of the first row of pixel values in the 
image) to ensure that the same dithering pattern 
is not used in every image. This initial seed value 
is stored in the header of the compressed image 
for reuse when uncompressing the image. 

2.3. Quantization Noise 

While quantization reduces the entropy in the 
representation of the pixel values (thus improv- 
ing the image compression ratio), any quantization 
operation, even with subtractive dithering, modi- 
fies the pixel values and thus inherently increases 
the RMS noise in the pixel- to- pixel variations in 
the image by an amount given by: 

+ A 2 /12 (9) 

where the total variance in the quantized im- 
age is the quadrature sum of the variance cr§ in 
the original image plus the quantization noise vari- 
ance, and where A is the intensity spacing between 
the quantized levels. The factor 1/12 comes from 
the variance of a uniform random distribution with 
unit width (Janesick 2001, also see the appendix 
to Paper I). Substituting A = ao/q, the fractional 
increase in the noise caused by quantizing the im- 
age can be expressed as 

<x q /ao = s/1 + l/{12q 2 ) (10) 

Figure 1 shows how both the fractional noise ra- 
tio (from equation 10) and the compression ratio 
(from equation 6) depend on q. To foreshadow the 
results of the experiments that will be described 
in §3. this figure shows that q values in the range 
of 4 to 1 provide a good combination of high com- 
pression ratio with relatively little added noise. 

It should be cautioned that if a floating- 
point image is repeatedly compressed and uncom- 
pressed, then the cumulative quantization noise 



Fig. 1. — Relationship between the compression 
ratio and the fractional increase in the background 
noise, for a range of q quantization parameter val- 
ues. 

will be given by 

O-q/cro = V^TNJW) (11) 

where N is the number of compression and uncom- 
pression cycles. (This assumes that the dithering 
is re-randomized during each cycle, which is al- 
ways the case in our implementation of the sub- 
tractive dithering algorithm). The amplitude of 
this effect depends strongly on the q value. To 
put this in practical terms, an image can be com- 
pressed and uncompressed at least 16 times using 
q > 4 before the noise would increase to the same 
level equivalent to compressing the image once 
with q = 1. But if the image is compressed multi- 
ple times using q = 1, the noise would increase to 
scientifically unacceptable levels after just a few 
cycles. Ideally, an image should only be com- 
pressed once, and then all the subsequent data 
analysis should be performed directly on the tile- 
compressed FITS file. If the software cannot read 
the compressed format directly, then it should op- 
erate on an uncompressed version of the original 
compressed file, which then should not be recom- 
pressed. 
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2.4. fpack and funpack Compression Util- 
ity Programs 

In order to make the image quantization and 
compression techniques that are described in the 
previous sections more widely available to the 
astronomical community, we have developed a 
pair of general purpose utility programs, called 
fpack and funpack (Seaman et al. 2007), which 
can be used to compress and uncompress any 
FITS image, in integer or floating-point format. 
These utilities rely on the underlying CFITSIO 
library (Pence 1999) to perform the quantiza- 
tion and compression operations. The fpack 
and funpack utility programs were used in the 
experiments that are described in the follow- 
ing sections to quantize and compress the im- 
ages. Further information about fpack and fun- 
pack is available from the HE AS ARC web site at 
http: / /heasarc.gsfc.nasa.gov/ fitsio/ fpack/. 

3. Experiments on Quantized Images 

In this section we present the results of experi- 
ments designed to show how the measurements of 
objects in an image are affected as the image is 
quantized by varying degrees. In particular, we 
will verify that the noise in the image increases as 
a function of q by the amount predicted by equa- 
tion 10, and more significantly, that the statisti- 
cal errors on the magnitude and position measure- 
ments of faint objects in an image, which are lim- 
ited mainly by the background noise, also increase 
by a similar factor. These results will provide gen- 
eral guidelines for achieving the greatest amount 
of image compression while still preserving the re- 
quired level of scientific precision in the image. 

Section 3.1 describes the method of construct- 
ing the simulated CCD images that are used in 
the first 2 experiments. The first experiment, in 
§3.2, examines how quantization affects the uncer- 
tainties of measurements of single star images, and 
the second experiment, in §3.3, examines the case 
where many quantized images are added together 
to detect sources far below the detection threshold 
of a single image. Finally, the third experiment, 
in §3.4, is performed on a set of actual astronomi- 
cal images to verify the results obtained from the 
synthetic images. 


3.1. Data Model and Measurement Meth- 
ods 

In order to determine how quantization affects 
the precision of measurements of objects in an im- 
age, we generated a large sample of realistic CCD 
star images with known input positions and mag- 
nitudes. This allows us to precisely calculate the 
errors on the measured positions and magnitudes 
in the quantized images. All the stars have circu- 
lar Gaussian profiles with a = 1.0 and FWHM = 
2.35 pixels. This is typical of the spatial resolution 
commonly found in astronomical CCD images and 
is adequate to avoid the difficulties when analyz- 
ing spatially undersampled images. The central 
location of the star images, relative to the pixel 
grid, was varied so as to average out any subtle 
biases in the subsequent star detection and mea- 
surement steps that might depend on the exact 
position. The total integrated flux in the stars cov- 
ered a range of 10 magnitudes (a factor of 10000 
in intensity) in 0.5 magnitude increments. Finally, 
the sky background was simulated by adding 1000 
counts to each pixel. 

Poissonian-distributed shot noise and Gaussian 
distributed “read-out” noise was then added to 
each of these star images to simulate real CCD 
images. The shot noise in each pixel was ran- 
domly calculated using a a equal to the square 
root of that pixel value (which implicitly assumes 
that the “gain” of the simulated CCD has been 
set to 1 electron per analog-to-digital readout 
count), and the readout noise was calculated us- 
ing a — 10. The read-out noise in these images 
is relatively small compared to the shot noise in 
the sky background, winch is usually the case 
for real astronomical CCD images that have a 
moderately bright background level. The total 
noise in the background areas of these images has 
a = \/l000 + 10 2 = 33.2. Different starting ran- 
dom seed values were used so that the actual iroise 
distribution varies in every image. 

The widely used SExtractor source extraction 
program (Bertin & Arnouts 1996) was employed 
to objectively detect and measure the position and 
magnitude of the stars in these simulated, noisy 
CCD images. SExtractor calculated the positions 
(given by the XJMAGE and YJMAGE output 
parameters) from the flux-weighted centroid of all 
the pixels in each star image above an empirically 
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determined flux detection threshold. The total 
magnitude of each star (given by the MAG_APER 
parameter) was measured within a 7-pixel diame- 
ter aperture (i.e., out to 3.5 a of the Gaussian pro- 
file) around the centroid position. We calculated 
the actual errors on these measurements from the 
differences between the known input position and 
flux of each star in the synthetic image. For ref- 
erence, the faintest stars that SExtractor could 
reliably detect in these images contained about 
1000 net counts, which is coincidentally equal to 
the mean sky counts in each pixel. We arbitrarily 
set the zero point of the instrumental magnitude 
scale, 

to = — 2.51og(flux) + ZeroPoint (12) 

so that these faintest detected stars have a mag- 
nitude of 20.0. 

We repeated the SExtractor measurements of 
the simulated CCD images after quantizing and 
compressing each image with fpack using q val- 
ues ranging from 8 to 0.25, both with and with- 
out the subtractive dithering option. Since the 
SExtractor program cannot directly read images 
in the compressed FITS format, it was necessary 
to convert them back into the standard FITS im- 
age format using funpack (but the pixel values 
remain quantized of course). In the case where 
dithering was not applied, we also selected the 
option to compress the entire image as a single 
large tile so that all the pixels are quantized into 
the same fixed grid of intensity values. If we had 
used the default row-by-row tiling option instead, 
it effectively would have introduced some dither- 
ing of the quantized levels between adjacent rows, 
and the results would be intermediate between the 
dithered and non-dithered cases presented below. 

Figure 2 shows the dramatic effect that subtrac- 
tive dithering has on the histogram of the pixel 
values in a q = 1 quantized image; the dithered 
image histogram is nearly identical to the that of 
the original image, whereas the non-dithered his- 
togram clearly shows the coarse intensity binning. 
Without dithering, the image breaks up into dis- 
crete “bands” of constant intensity which makes it 
more difficult to detect faint features in the image. 
This effect is shown in Figure 3, (using an even 
more coarsely quantized q = 0.5 image to enhance 
the effect), where the majority of the background 
pixels all have the same (medium gray) intensity 



Fig. 2. — Histograms of the pixel intensity dis- 
tribution in one of our test images which demon- 
strates the beneficial effects of subtractive dither- 
ing when quantizing the image with q = 1. When 
dithering is applied (the finely stepped histogram), 
the pixel distribution closely matches that in the 
original image (the continuous curve). When 
dithering is not applied (the coarsely stepped his- 
togram) then subtle gradations in image intensity 
are lost. 



Fig. 3. — Images of a pair of faint (m = 20) ar- 
tificial stars (left panel) and the same image after 
quantizing the image with q = 0.5 with dither- 
ing (middle panel) and without dithering (right 
panel). Without dithering, the background breaks 
up into broad patches of constant intensity level, 
and the star images become harder to detect. 
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value. 


3.2. Experiment 1: Single Images 

In this first experiment, we examine how the 
uncertainties of the photometric and astrometric 
measurements of individual stars depend on the 
q quantization factor. To measure this effect, we 
computed the Standard Deviation of the magni- 
tude and position errors (i.e., the value computed 
by SExtractor minus the known input magnitude 
or position value) for 6250 simulated star images 
at each 0.5 magnitude increment. Figure 4 shows 
how the magnitude uncertainties decrease as the 
brightness of the star increases (towards the right). 
The lower, thicker line in the figure was derived 
from the original, unquantized images and shows 
that the statistical uncertainty on the magnitudes 
decreases from <r = 0.23 for the faintest detectable 
stars (with m = 20), to a = 0.018 for stars that 
are 3 magnitudes brighter. The line has a slope 
close to 1.0 (in log - log coordinates) as expected 
in the limiting case where the noise is dominated 
by the sky, and hence the signal-to-noise ratio of 
the measurement increases in direct proportion to 
the signal. 

The other 3 lines in Figure 4 were derived from 
the same images after they were first quantized 
and compressed with successively coarser q values 
of 1.0, 0.5, and 0.25, when applying the subtrac- 
tive dithering option. The vertical displacement 
of these lines shows that the measurement errors 
are systematically larger in the quantized images 
as compared to the measurements in the original 
image. This increase is relatively small for q > 
1, but increases rapidly for coarser quantization 
values. 

Figure 5 shows a similar plot of the Standard 
Deviation of the stellar centroid measurements, in 
units of pixels, as a function of the magnitude of 
the star and the q quantization factor. In the origi- 
nal unquantized images, the positional uncertainty 
decreases from <r = 0.24 pixels for the m = 20 
stars, to a = 0.035 pixels for the stars that are 3 
magnitudes brighter. The positional measurement 
uncertainties are larger in the quantized image by 
about the same factor as the increase in the mag- 
nitude uncertainties shown in Figure 4. 

In order to better quantify the effects shown 
in these 2 figures, Table 1 summarizes how the 



Fig. 4. — The effect of q on the measured magni- 
tude uncertainties when dithering is applied. 
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Fig. 5. — The effect of q on the measured posi- 
tional uncertainties when dithering is applied. 
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noise and the measurement uncertainties increase 
as the images are quantized more coarsely. The 
percentage increase in the measured MAD noise 
level in the quantized images is given in column 2 
and agrees exactly with the predicted value from 
equation 10. More strikingly, the magnitude and 
position uncertainties for the faintest stars also in- 
crease by about the same factor as the noise, as 
shown in columns 3 and 4. This means that equa- 
tion 10 also provides a good estimate of how much 
the measurement uncertainties of the faintest ob- 
jects in an image, which are limited by the back- 
ground noise, will increase when the image is 
quantized. 

Finally, columns 5 and 6 in Table 1 give the 
corresponding increase in measurement uncertain- 
ties for stars that are 5 magnitudes, or a factor 
of 100, brighter than the image detection thresh- 
old. As expected, these brighter objects are rel- 
atively less affected by quantization because the 
inherent Poissonian noise in the brighter pixels is 
larger than the spacing between the quantized lev- 
els. Also, the small formal statistical errors on 
the magnitudes and positions of the brighter stars 
(which are less than 0.001 of a magnitude or pixel, 
respectively, in this case) are often insignificant 
compared to the systematic errors in the absolute 
calibration of the measurements. 

For comparison, we also measured the posi- 
tion and magnitude uncertainties in images that 
were quantized but without subtractive dithering. 
These results are shown in Figures 6 and 7. ft is 
apparent from comparison with Figures 4 and 5 
that the uncertainties in the undithered quantized 
images are much larger than in the dithered case. 
In order to achieve the same level of photometric 
precision in this test, the undithered images must 
be quantized by about a factor of 10 finer spacing 
than in the dithered case, which greatly reduces 
the resulting compression ratio of the image. As 
was discussed previously in §2, one reason for this 
difference is that dithering the pixel values has a 
large effect on the measured background sky in- 
tensity level. This is illustrated in Figure 8, which 
shows the average difference between the magni- 
tudes measured in the original image and in a q 
— 0.5 quantized version of the same image. In the 
non-dithered case, even stars that are 3-4 mag- 
nitudes brighter than the detection threshold are 
still significantly biased. Note that the amount of 



Fig. 6. — Same as Figure 4 but without dithering. 
The magnitude uncertainties are much larger than 
in the dithered case. 
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Fig. 7. — Same as Figure 5 but without dithering. 
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Table 1 

Increased noise and measurement uncertainties as function of q 


q 

O'q/'T'O 

Amag 2 

Apos 2 

Amag 5 

Apos 5 

4 

0.26% 

0.31% 

0.18% 

0.10% 

0.20% 

2 

1.04% 

1.1% 

0.93% 

0.25% 

0.83% 

1 

4.08% 

5.6% 

4.1% 

1.7% 

3.4% 

0.5 

15.5% 

19% 

15% 

5.8% 

12% 


bias shown here depends on how closely the near- 
est quantized level happens to lie to the true sky 
level in the image. 

The benefits of dithering are less pronounced 
on the astrometric measurements, as can be seen 
by comparing Figures 5 and 7. The non-dithered 
image only needs to be quantized with about a 
factor of 2 finer spacing to give the same positional 
precision as in the dithered image. 

3.3. Experiment 2: Co-added Images 

The second experiment investigates how well 
faint star images that are below the detection 
threshold of a single image can be measured af- 
ter co-adding many images together. To first or- 
der, one would expect that the noise in the co- 
added image will decrease by a factor equal to 
the square root of the number of co-added im- 
ages. For this experiment we co-added 250 star im- 
ages, similar to those generated in the first exper- 
iment, which should allow detection of stars that 
are about 2.5 log \/250 = 3.0 magnitudes fainter 
than in the single images. We repeated this co- 
adding experiment multiple times, using different 
starting seed values for the random background 
noise distribution, to generate a total of 250 simu- 
lated co-added star images in each of the 0.5 mag- 
nitude bins. The magnitude and position of each 
of these co-added stars were then measured with 
the SExtractor program, just as in the first exper- 
iment. 

Figure 9 shows the results of the photometric 
measurements in the co-added images. Other than 
the fact that the lines in the plots exhibit more 
statistical scatter, since they were derived from a 
sample of only 250 star measurements in each 0.5 
magnitude bin instead of 6250, these results are 
very similar to the results from the single images 



20 19 18 17 16 


Magnitude 

Fig. 8. — Mean errors on the measured magni- 
tudes of the fainter stars when coarsely quantizing 
an image with q = 0.5, with and without dither- 
ing. The large errors in the non-dithered case are 
caused by systematic biases in the intensity mea- 
surements of the sky background. The amplitude 
of the errors shown here are only representative 
of one specific case and depend on how closely 
the nearest quantized background level (without 
dithering) happens to match the true sky inten- 
sity level in the image. 
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Fig. 9. — The effect of q on the measured magni- 
tude uncertainties in the sum of 250 images, with 
dithering. 



Fig. 10. — Same as Figure 9, but without dither- 
ing. Dithering the 250 individual images has little 
effect on the precision of the magnitude measure- 
ments in the summed image. 


shown in Figure 4, after allowing for the expected 
shift of the horizontal axis by 3 magnitudes. This 
similarity confirms that the photometric measure- 
ments in the co-added images have the same de- 
pendence on q as in the individual images. In par- 
ticular, it confirms that information about faint 
stars that are well below the detection threshold 
in a single image is still preserved in the quantized 
images and can be detected after co-adding many 
quantized images together. 

We also confirmed that the astrometric preci- 
sion in the co-added images shows the same de- 
pendence on q as the measurements in the single 
images. A plot of the positional uncertainty as a 
function of magnitude (not shown here) is almost 
identical to Figure 5, except for the 3 magnitude 
shift of the horizontal axis. 

Finally, Figure 10 shows the magnitude uncer- 
tainties when the quantized images are not sub- 
tractively dithered before co-adding them. Unlike 
the case for single images (as compared in Figures 
4 and 6), dithering has little effect on the pre- 
cision of the photometry in the co-added image. 
This is because the co-adding process effectively 
introduces its own form of pixel dithering if the 
quantized levels in the different images are ran- 
domly offset with respect to each other. Thus, 
there is little added benefit by also dithering the 
pixel values within each image. Price- Whelan & 
Hogg (2010) demonstrate an even more extreme 
case where accurate photometry and astrometry 
could be performed on a co-added sum of 1024 
images, each quantized with q = 1/16 = 0.0625 
without any dithering. 
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Fig. 11. — Sample region in one of the test im- 
ages taken with the Isaac Newton Telescope at La 
Palma. This shows about 10% of the total image 
area. 

3.4. Experiment 3: Real Astronomical 

Images 

In the third and final experiment we performed 
tests on a set of real astronomical CCD images 
to confirm the previous results on synthetic im- 
ages. We used a sequence of 20 similar CCD 
images that show a random distribution of faint 
stars and galaxies taken with the 2.5-m Isaac 
Newton Telescope at La Palma (Figure 11). All 
the exposures were for 600 seconds through a V- 
band filter with the same pointing on the sky 
(12 h ol m , 26°24'). These images are publicly avail- 
able through the virtual observatory portal at 
http://portal-nvo.noao.edu. 

We performed 2 tests to measure the effects of 
quantization on these images. In the first test we 
quantized one of the images using q = 1 (with sub- 
tractive dithering), by compressing it with fpack 
and then uncompressing it again with / unpack . 
As predicted by equation 6, the compression ratio 
achieved by fpack on these images depends almost 
entirely on the q value that is used. We then com- 
pared the SExtractor magnitude measurements in 
the original image to those derived from the quan- 
tized image. Since we do not know the true mag- 
nitudes of the stars in this image (unlike in the 
experiments on the synthetic stars), we can only 
compare the measurements of the stars in the 2 


images, both of which have measurement uncer- 
tainties. The background noise increased by 4.1%, 
from a = 22.79 in the original image to a = 23.73 
in the quantized image, as predicted by equation 
10. Also as expected, the statistical errors (as cal- 
culated by SExtractor) on the magnitude measure- 
ments in the quantized image increased by a sim- 
ilar amount, 3.8% on average, over the same mea- 
surements in the original image. The top panel of 
Figure 12 shows the difference between the 2 mag- 
nitude measurements for each star as a function of 
the magnitude of the star. The lower panel shows 
that the corresponding relative errors (the mag- 
nitude difference divided by the statistical error 
on that magnitude as calculated by SExtractor). 
The RMS value of all the relative errors is 0.37cr; 
The fact that this is much less than the » lcr that 
one would expect when comparing the magnitudes 
derived from 2 identical CCD images of the same 
stars, confirms that quantizing the image with q 
= 1 (which results in a compression ratio of 10) 
has not introduced statistically significant photo- 
metric errors. 

In the second test we co-added the 20 CCD 
images, before and after quantizing them with q 
= 1, and then calculated the differences in the 
measured magnitudes in the 2 co-added images. 
The results, shown in Figure 13, are similar to 
those in Figure 12, derived from a single image, 
except that the faintest detected stars are now 
2.51ogV20 — 1.6 mag fainter, as expected from 
co-adding 20 images. The RMS value of the rel- 
ative errors shown in the lower panel is 0.34<r in 
this case, which again demonstrates that quantiz- 
ing the images with q = 1 has not produced any 
significant photometric errors, even in objects that 
are fainter than the detection threshold in a single 
image. 

4. Summary and Discussion 

The main purpose of this work has been to find 
a more effective compression method for floating- 
point astronomical images than is provided by the 
lossless methods that are commonly used (such 
as GZIP). These floating-point images typically 
do not compress well with lossless algorithms be- 
cause a large fraction of the bits in each pixel 
value representation contain no significant infor- 
mation and are effectively filled with uncompress- 
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ible noise. We have adopted the method of elim- 
inating some of this noise by quantizing the pixel 
values into a set of discrete, linearly spaced inten- 
sity levels, which are represented by scaled integer 
values. The scaled integers can then be efficiently 
compressed using the very fast Rice algorithm. In 
order to make these compression techniques more 
widely available, we have produced a pair of util- 
ity programs called fpack and f 'unpack that can be 
used to compress any FITS format image. 

For convenience, we define a quantization pa- 
rameter, q, which is equal to the measured RMS 
noise in background regions of the image divided 
by the spacing between the quantized intensity 
levels. Given a particular q value when quantizing 
and compressing an image, one can calculate from 
fundamental principles the expected image com- 
pression ratio, as given by equation 6. Coarser 
quantization (i.e., smaller q values) gives greater 
image compression, but at the same time it in- 
creases the RMS pixel-to-pixel noise by an amount 
given by equation 10 which also tends to degrade 
the precision of the magnitude and position mea- 
surements of the objects in the image. 

Our series of experiments on simulated and on 
real astronomical CCD images demonstrate that 
the noise equation 10 also gives a good estimate 
of the increase in measurement uncertainties for 
objects near the detection threshold in a quan- 
tized image (which are noise limited). For many 
practical applications, q values between 1 and 4 
provide a good combination of high compression 
and low increased noise: q = 1 gives a compres- 
sion ratio of about 10 while increasing the noise 
and the measurement uncertainties by about 4%, 
and q = 4 gives a compression ratio of 6 with a 
negligible 0.26% increase in the noise and mea- 
surement uncertainties. In ‘quick-look’ types of 
applications, where high scientific accuracy is not 
of primary importance, even greater compression 
can be obtained by using q values less than 1. 

One necessary requirement for achieving these 
results, however, is that the pixel values must be 
dithered during the quantization process to im- 
prove the signal-to-noise in measurements of faint 
signals in the image. Without dithering, the faint 
areas of coarsely quantized images tend break up 
into broad plateaus of constant intensity, and it 
becomes more difficult to detect the faintest ob- 
jects in the image. In astronomical images this ef- 
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Fig. 12. — Differences in star magnitudes mea- 
sured in a CCD image and aq = l quantized ver- 
sion of the same image. The upper panel shows 
the absolute magnitude differences, and the lower 
panel shows the relative differences after dividing 
by the statistical error. 
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Fig. 13. — Same as Figure 12, except the magni- 
tudes are measured in the sum of 20 CCD images 
and in the sum of the q = 1 quantized version of 
each of the images. 
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feet can seriously impact intensity measurements 
of the local sky background which can lead to 
systematic biases in photometric measurements in 
the quantized image. In our experiments using 
SExtractor, a dithered image can be quantized by 
about a factor of 10 coarser than a non-dithered 
image (thus resulting in a much higher compres- 
sion ratio) while still preserving the same amount 
of photometric precision. 

Several other recent studies have reached sim- 
ilar conclusions about the benefits quantizing as- 
tronomical images to reduce the amount of noise 
and improve the compression ratio. Price- Whelan 
k Hogg (2010) performed photometric and astro- 
metric measurements on stars in simulated CCD 
images (using a slightly different methodology 
than used here) and concluded that no significant 
errors were introduced if the image is quantized 
(without dithering) with q > 2. Bernstein et al. 
(2010) found that using q = 1, along with square- 
root scaling does not induce any significant bias 
in weak-lensing shape measurements of galaxies 
(i.e., the second statistical moment of the image) 
in simulated images from future space-based imag- 
ing missions. Caldwell et al. (2010) describe how 
the Kepler mission is performing on-board quan- 
tization of the images with q values as low as 1.15 
to obtain the necessary amount of compression of 
the downlinked telemetry while still preserving the 
required high precision in the differential photom- 
etry measurements of stars. While we recognize 
that applying any sort of lossy compression scheme 
to scientific data tends to go against the inclina- 
tion of scientists and data archive professionals, 
hopefully the results of the experiments performed 
here and in the other cited references will help to 
alleviate these concerns. 

While we believe our test results should be ap- 
plicable to a wide range of floating-point astro- 
nomical images, we strongly encourage users to 
perform their own tests to verify that this com- 
pression technique is appropriate for their own 
data. Users can easier create a quantized and 
dithered version of any floating- point FITS image 
by compressing it with fpack and then uncompress- 
ing it with /unpack. This image can then then be 
processed just like the original image to see if there 
are any significant differences in the results. 
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